Human strategy updating in evolutionary games 
Evolutionary game dynamics describes not only frequency dependent genetical evolution, but also 
cultural evolution in humans. In this context, successful strategies spread by imitation. It has been 
shown that the details of strategy update rules can have a crucial impact on evolutionary dynamics in 
theoretical models and e.g. significantly alter the level of cooperation in social dilemmas. But what 
kind of strategy update rules can describe imitation dynamics in humans? Here, we present a way to 
measure such strategy update rules in a behavioral experiment. We use a setting in which individuals are 
virtually arranged on a spatial lattice. This produces a large number of different strategic situations from 
which we can assess strategy updating. Most importantly, spontaneous strategy changes corresponding 
to mutations or exploration behavior are more frequent than assumed in many models. Our experimental 
approach to measure properties of the update mechanisms used in theoretical models will be useful for 
mathematical models of cultural evolution. 



Classical game theory assumes that agents make rational 
decisions, taking into account that they are interdependent 
with other agents that are also fully rational [1]. While this
assumption has proven to be problematic even in humans,
evolutionary game theory has been developed to describe
the dynamics of genetic or cultural evolution when fitness
is not fixed, but depends on the interactions with others.
Applications of this framework range from the dynamics of
microbes [2-4] to animal behavior [5, 6] and human
behavior [7-9]. Many aspects of evolutionary dynamics hinge
upon the microscopic rules describing how successful
strategies spread. In particular in structured populations, these
rules can crucially alter the evolutionary outcome and, for
example, determine whether cooperation evolves or not
[10-12]. Thus, it is of great importance to infer how strategies
are actually adopted. To this end, we have developed a be- 
havioral experiment that mimics typical properties of the- 
oretical models, but replaces the computer agents by real 
human players. Each player interacts only with her imme- 
diate neighbors. To evaluate his performance, each player 
can compare his payoff to the payoff of the neighbors and 
use this as a basis to adopt new strategies. However, there 
are some subtle differences between mathematical models 
and human behavior: Humans may use mixed strategies, 
i.e. randomize between their options, or even change their 
strategies over time, whereas most theoretical models con- 
sider the simplest case in which a player's strategy is equated 
with his action. Thus, any change in behavior is equated 
to a change in the strategy. If we aim to apply this sim- 
ple framework of one-shot games as a first approximation 
to describe human behavior, we have to infer the details 
of strategy adoption, e.g. the rate of spontaneous strategy 
changes. We utilize a spatial game in which human players 
are interacting with their immediate neighbors only. This 
leads to a large number of different strategic situations that 
allow us to infer under which circumstances a neighboring
strategy is adopted. 

A large portion of the literature on evolutionary games 
focuses on the Prisoner's Dilemma. This is a paradigm 
to study the evolution of costly cooperation among selfish 
individuals, because it highlights the potential differences 
between individual interests and the social optimum [13-17].
In the Prisoner's Dilemma, two players have to decide
simultaneously whether to cooperate with the other or
not. If both players cooperate, they obtain a reward R. If
one defects while the other cooperates, the defector gets T
(temptation to defect) and the cooperator obtains S (sucker's
payoff). If both defect, they get a punishment P. This
can be summarized by the payoff matrix

\[
\begin{array}{c|cc}
 & C & D \\ \hline
C & R & S \\
D & T & P
\end{array}
\qquad (1)
\]

The Prisoner's Dilemma is characterized by the payoff
ranking T > R > P > S (and in addition 2R > T + S for
repeated games). In this case, rational individuals choose 
defection: They are greedy and try to exploit other coopera- 
tors (T > R), but they also fear that the other one will try to 
exploit them (P > S). However, since mutual cooperation 
yields a higher payoff than mutual defection (R > P), play- 
ers face a dilemma: Individual reasoning leads to defection, 
but mutual cooperation implies a higher payoff. Similarly, in 
an evolutionary setting the higher payoff of defectors implies 
more reproductive success and thus cooperation should not 
evolve. However, cooperation can evolve for example by kin 
selection, spatial structure or when interactions are repeated 
[18]. There is a large body of literature on behavioral
experiments based on the repeated Prisoner's Dilemma.
It is clear that humans behave in a more
sophisticated way than simple computer programs [19], but
it has also been shown that working memory constraints
limit human behavior in repeated games [21]. Nonetheless,
with few exceptions, see e.g. [22], theorists have focused
on simple forms of strategy choice, e.g. to disentangle the
effects of population structure and game characteristics. In
particular, the spatial version of the Prisoner's Dilemma has
been analyzed in great detail by theorists [23-27]. Initially,
research has focused on simple lattices that approximate
interactions in spatially homogeneous systems. More
recently, many studies have addressed complex social
networks instead [28, 29]. Typically, players are arranged on a
social network and interact pairwise only with their immedi- 
ate neighbors, choosing either cooperation or defection for 
all interactions. In each round, the payoff of every player is 
accumulated in pairwise encounters with all its neighbors. 
Individuals with high payoffs are either imitated more often 
than others (in social models) or produce more offspring (in 
genetic models). The dynamics in spatially structured pop- 
ulations depend crucially on the details of the microscopic 
rules by which the players update their strategies. Our goal 
is to shed some light onto these microscopic rules that de- 
scribe how players change their strategies. 

Such a behavioral experiment with humans can only be 
done in comparably small systems due to some restrictions 
in experimental games that are absent in mathematical 
models. For example, participants have to be paid in real 
money and their anonymity must be guaranteed such that 
the results are not blurred by potential reputation effects. 
Throughout this study, we focus on R = 0.30 €, S = 0.00
€, T = 0.40 €, and P = 0.10 €. This leads to the
2x2 payoff matrix

\[
\begin{array}{c|cc}
 & C & D \\ \hline
C & 0.30\ \text{€} & 0.00\ \text{€} \\
D & 0.40\ \text{€} & 0.10\ \text{€}
\end{array}
\qquad (2)
\]

Players were virtually arranged on a spatial 4x4 lattice with
periodic boundary conditions, which corresponds to the sur- 
face of a torus. The participants had four fixed neighbors 
throughout the entire game. Thus, the possible cooperator 
payoffs accumulated in their four interactions are 0.00 €, 
0.30 €, 0.60 €, 0.90 €, and 1.20 €. A defector has the 
possible payoff values 0.40 € , 0.70 € , 1.00 € , 1.30 € , and 
1.60 €. 
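
The accumulated payoffs listed above can be checked with a short sketch. It assumes only the payoff values R, S, T and P given in the text; the function name `accumulated_payoffs` and the use of euro cents (to avoid floating-point rounding) are illustrative choices, not part of the original study.

```python
# Payoffs per pairwise interaction in euro cents (values taken from the text).
R, S, T, P = 30, 0, 40, 10
NEIGHBORS = 4  # each player interacts with four neighbors

def accumulated_payoffs(n_neighbors=NEIGHBORS):
    """Return per-round payoffs (in cents) for a cooperator and a defector.

    Index k of each list is the number of cooperating neighbors.
    """
    cooperator = [k * R + (n_neighbors - k) * S for k in range(n_neighbors + 1)]
    defector = [k * T + (n_neighbors - k) * P for k in range(n_neighbors + 1)]
    return cooperator, defector

coop_cents, defect_cents = accumulated_payoffs()
print("cooperator:", [c / 100 for c in coop_cents])
print("defector:  ", [d / 100 for d in defect_cents])
```

With k cooperating neighbors, a defector always earns 0.40 € more than a cooperator in the same neighborhood, which is what makes defection individually tempting.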

Many theoretical studies are based on synchronous updat- 
ing, which means that all players make strategy revisions at 
the same time. This can easily be mimicked in behavioral 
experiments. However, the way that strategies are changed 
is more difficult to address. A typical assumption is that 
each player chooses the strategy that obtains the highest 
payoff in the neighborhood, either his previous strategy or 
a different one. In our experiment, players have many dif- 
ferent possibilities for strategy updating. It is clear that 
human players sometimes do not follow this "imitate the 
best" rule, but choose their strategies in a different fash- 
ion. Nonetheless, this imitation dynamics can serve as a 
first approximation for strategy updating. 

More recent studies have stressed that strategy adoption
is stochastic, which can be modeled by introducing an intensity
of selection [30, 31]. One possibility is the following
imitation process with errors: Each player compares his payoff
to the best performing neighbor that has played a different
strategy and calculates the payoff difference $\Delta\pi$. With
probability $p = (1 + \exp[-\beta \Delta\pi])^{-1}$, he adopts the
neighbor's strategy [32-34]. Here, $\beta$ measures the intensity of
selection, i.e. how important the payoffs are for strategy
revisions. In our case with two strategies only, this is
equivalent to the multinomial logit model [35, 36]. Our goal is
to understand which strategy adoption rules can describe
human behavior in this game.
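
As an illustration of the imitation process with errors described above, the following minimal sketch evaluates the adoption probability p = 1 / (1 + exp(-beta * delta_pi)). It assumes that delta_pi is the neighbor's payoff minus the focal player's payoff and that payoffs are measured in euros; both conventions, and the function name, are assumptions made for this example.

```python
import math

def switch_probability(delta_pi, beta):
    """Probability of adopting the neighbor's strategy under the Fermi rule.

    delta_pi: payoff of the neighbor minus own payoff (sign convention assumed).
    beta:     intensity of selection (beta = 0 gives a coin flip).
    """
    x = beta * delta_pi
    # Numerically stable evaluation so that very large |x| cannot overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

for beta in (0.0, 1.2, 1000.0):
    print(beta, [round(switch_probability(d, beta), 3) for d in (-0.4, 0.0, 0.4)])
```

For very large beta the function approaches a step at delta_pi = 0, i.e. the deterministic "imitate the best" rule, while small beta makes the choice almost independent of payoffs.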



I. RESULTS 

Let us first address if imitation dynamics can describe 
human strategy updating. In total, we have 5760 individ- 
ual decisions to keep a strategy or to switch it. As a first 
model, we assume that all individuals use the imitate the 
best rule, i.e. they always imitate the best performing neigh- 
bor strategy, including their own. It has been shown that 
this cannot fully describe human behavior [37]. Fig. 1
reveals that in our experiment, initially 62% of the individuals
follow the "imitate the best" rule. However, the remaining
38% of the strategy changes cannot be explained by pure
imitation. This fraction tends to decrease over time in the
experiment. Fitting an exponential function to the data
from Fig. 1 reveals that the fraction of strategy choices that
are not explained by imitation decreases roughly by 4% per 
round. This reflects the fact that strategy choice changes 
over time in our behavioral experiment and that a stationary 
state is not reached. 
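
The exponential fit can be made concrete with a small sketch. It assumes the fitted values reported with Fig. 1 (an initial fraction of 0.380 and a decay factor of 0.962 per round) and that rounds are counted from t = 1; the names `NU0`, `GAMMA` and `unexplained_fraction` are chosen here for illustration.

```python
NU0 = 0.380    # fitted initial fraction of choices not explained by imitation
GAMMA = 0.962  # fitted decay factor per round

def unexplained_fraction(t, nu0=NU0, gamma=GAMMA):
    """Fraction of strategy choices not explained by imitation in round t (t >= 1)."""
    return nu0 * gamma ** (t - 1)

# Relative decrease from one round to the next: 1 - GAMMA, i.e. roughly 4%.
relative_decrease = 1.0 - GAMMA

for t in (1, 5, 10, 25):
    print(t, round(unexplained_fraction(t), 3))
print("relative decrease per round:", round(relative_decrease, 3))
```

Because the fraction shrinks by a constant factor rather than a constant amount, it is still clearly above zero after 25 rounds, in line with the statement that no stationary state is reached.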

In theoretical models of the spatial Prisoner's Dilemma, 
one is typically interested in the average level of cooperation 
of the system. The idea is that in a spatial setting, clusters 
of cooperators can form, leading to a significant degree of 
cooperation [11, 12, 23, 24]. To explore how the level of
cooperation is affected by spatial structure, we have also
conducted a control experiment in which the spatial structure
was broken up by reassigning each player's neighbors each
round. Since in the spatial treatment individuals always
interact with the same co-players and can form stable clusters
of cooperators, one would expect a higher level of
cooperation in the fixed-neighbors than in the random-neighbors
treatment. As described in previous human behavioral
experiments [38] (and not necessarily in line with the
expectations of theoreticians), the average level of cooperation at
the start of the experiment is comparatively large and very
similar in the treatment with fixed neighbors (70.0%, averaged
over 15 repeats) and the treatment with random neighbors
(70.6%, averaged over 10 repeats). Most interestingly, we
do not find a significant difference in the level of cooperation
during the course of the game between the two treatments,
see Fig. 2. Only in round 4 is there a significant
difference between the levels of cooperation, which disappears
after Bonferroni correction for multiple comparisons.
Stable clusters of cooperators are not found in our behavioral
experiments. The high probability of spontaneous strategy 
changes decreases the influence of spatial structure. 

FIG. 1: Strategy updating in behavioral experiments with
fixed neighbors. Most strategy changes can be explained
by imitation of the most successful neighbor (full bars), i.e.
changing to the best available strategy in a heterogeneous
environment where neighbors play different strategies (brown)
or sticking to the strategy when everyone does the same in
a homogeneous environment (orange). However, a large
portion of strategy changes cannot be explained by imitation
(open bars). These are either spontaneous strategy switching
in homogeneous environments with no role model (open
orange bars) or choosing a strategy that did not perform best
in an environment with different neighboring strategies (open
brown bars). The line shows a fit of the fraction of
strategy changes not explained by imitation. This fraction decays
approximately exponentially as $\nu_0 \cdot \Gamma^{t-1}$. A nonlinear
regression leads to $\nu_0 = 0.380 \pm 0.013$ and $\Gamma = 0.962 \pm 0.003$ (full
line). The diagrams on the right show an example for a
heterogeneous (top) and a homogeneous environment (bottom)
of a focal cooperating player. In total, we have 4315
heterogeneous situations and 1445 homogeneous situations in our
5760 strategy choice situations (graphic shows averages over
15 fixed neighbor treatments with 25 rounds and 16 players
each).

It turns out that the dynamics can be explained based on
the way that our subjects revise their strategies. The
general dynamics of the system can be captured by a simple
random strategy choice approach [39]. We assume that a
player can do two things when revising her strategy: (i) with
probability $\nu$, she chooses a random strategy, and (ii) with
probability $1 - \nu$, she imitates her best performing
neighbor. In our behavioral experiment, we find that $\nu$ decays
exponentially with the round $t$ of the game as $\nu = \nu_0 \Gamma^{t-1}$.
Such an exponential decay of exploration rates has been
reported before [40]. Our experiment yields for the best
fit $\nu_0 = 0.380$ and $\Gamma = 0.962$. To test our assumption, we
simulated the temporal dynamics of 15 runs under imitation
dynamics with four neighbors, fitting the strategy choice
parameters to the experiment. In order to be consistent with
random strategy choice, we assume that only a fraction of
$1 - 2\nu$ is correct imitation. A fraction $\nu$ is random strategy
choice leading to the "correct" strategy that is consistent
with imitation and a fraction $\nu$ are strategy changes not
expected from imitation. Fig. 2 reveals that this approach
can capture the average cooperation level in the behavioral
experiment. Comparing 15 simulations with 15 experimental
treatments reveals no significant difference between the
simulations and the experiments after Bonferroni correction,
which takes into account multiple comparisons. We can
summarize this approach by the following equation governing
strategy choice,

\[
p_{A \to B} = \nu_0 \Gamma^{t-1} + \left(1 - 2\nu_0 \Gamma^{t-1}\right) \Theta(\pi_B - \pi_A), \qquad (3)
\]

where B is the best performing neighbor of A, $t$ is the
round of the game, $\pi_A$ and $\pi_B$ are the payoffs of A and B,
respectively, and $\Theta(x)$ is the Heaviside function ($\Theta(x) = 0$
for $x < 0$ and $\Theta(x) = 1$ for $x > 0$). In our experiment, we
find $\nu_0 = 0.380 \pm 0.013$ and $\Gamma = 0.962 \pm 0.003$, see Fig. 1.

FIG. 2: The average level of cooperation tends to decrease
over time. Symbols show a behavioral experiment with
humans and lines correspond to simulations. In the
experiment, the treatment with fixed neighbors on a 4 x 4 lattice
with periodic boundary conditions (squares) is not
significantly different from the dynamics in a system with random
neighbors (triangles). Full lines show computer simulations
in which players either imitate their best performing
neighbor or choose a random strategy with probability $2\nu_0 \cdot \Gamma^{t-1}$,
where $\nu_0 = 0.38$ and $\Gamma = 0.96$ (fitted to the behavioral
experiment, see text). For such a high probability of random strategy
choice, the simulation results for fixed and random neighbors
are almost indistinguishable; the level of cooperation is driven
by random strategy choice rather than by spatial structure.
For comparison, dotted lines show computer simulations with
no mutations (experimental average over 15 repeats for fixed
neighbors and 10 repeats for random neighbors, each with 16
players; simulations starting from the cooperation level of the
experiment, averaged over $10^4$ realizations).

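
The update rule of Eq. (3) can be sketched as a single stochastic decision. This is an illustrative reading of the equation, not the simulation code used in the study: the function names are hypothetical, the fitted values 0.380 and 0.962 are taken from the text, and the value of the Heaviside function at exactly zero (set to 0 here) is an assumption, since the text does not specify it.

```python
import random

def heaviside(x):
    """Heaviside step function; the value at x == 0 is set to 0 (assumption)."""
    return 1.0 if x > 0 else 0.0

def adoption_probability(pi_a, pi_b, t, nu0=0.380, gamma=0.962):
    """Probability that A adopts the strategy of its best neighbor B in round t."""
    nu = nu0 * gamma ** (t - 1)
    return nu + (1.0 - 2.0 * nu) * heaviside(pi_b - pi_a)

def update_strategy(own, best_other, pi_a, pi_b, t, rng=random):
    """Return A's strategy for the next round, drawn according to Eq. (3)."""
    p = adoption_probability(pi_a, pi_b, t)
    return best_other if rng.random() < p else own

# A cooperator earning 0.60 next to a defector earning 1.00, in round 1:
print(adoption_probability(0.60, 1.00, t=1))
# The same comparison when the neighbor earns less than the focal player:
print(adoption_probability(1.00, 0.60, t=1))
```

In round 1 the rule adopts a better-performing strategy with probability 1 - 0.38 and a worse-performing one with probability 0.38, which corresponds to the 62% and 38% reported for the start of the experiment.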
Next, let us abstract from the fact that strategy adoption 
changes over time and analyze the way in which individuals 
imitate their co-players in more detail. First, we analyze all 
situations in which players do the same as their four
neighbors. How likely are they to switch strategies? It turns out
that cooperators switch to defection in such a homogeneous
environment with probability $\mu_C = 0.28 \pm 0.07$ (averaged
over 45 such situations). Defectors switch to cooperation
with probability $\mu_D = 0.25 \pm 0.01$ (averaged over 1400 such
situations). These probabilities correspond to spontaneous
mutations or strategy exploration of the players. To analyze 
imitation is less straightforward, because it is impossible to 
say if people changed to a different strategy imitating a par- 
ticular neighbor, several ones at the same time, at random 
or based on some more sophisticated argumentation. For 
example, human players who find themselves in a neighbor- 
hood of cooperators may be tempted to defect, anticipating 
to win the highest possible payoff, before another neighbor 
defects. They may also expect others to take advantage 
of a cooperative neighborhood sooner or later. However, 
we can at least quantify the average behavior. We take all 
decisions into account in which a focal cooperator had at 
least one defecting neighbor (1524 decisions) or in which a 
focal defector had at least one cooperating neighbor (2791
decisions). Some of these strategy changes will again corre- 
spond to random strategy exploration, but we can assume 
that this occurs with a probability that is independent of 
the payoff difference. 

Depending on the payoff difference to the neighbor who 
performs best by using a different strategy than the focal 
player, what is the probability that the focal player switches 
to that other strategy? Fig. 3 shows that the probability
increases with the success of the neighbor, as expected. A
cooperator is typically confronted with a defector performing
better, while a defector can typically only choose to
imitate a cooperator performing worse. Moreover, defectors
are more resilient to change than cooperators. To model
strategy changes, we assume that the probability to switch
strategy is given by $p = (1 + \exp[-\beta \Delta\pi])^{-1}$. Note that for
$\beta \to \infty$, we recover the unconditional imitation from above.
Fitting this function to the data shown in Fig. 3 leads to
$\beta = 1.20 \pm 0.25$. The error corresponds to the standard
deviation in a binomial distribution, $\sqrt{p(1-p)/n}$, where $n$
is the number of samples. If we want to take the difference
in strategy adoption of cooperating players and defecting
players into account, we can also fit two different functions
to the data, see Fig. 3. If we instead use the average payoff
difference to players using a different strategy, we obtain
$\beta = 1.15 \pm 0.23$. Also in this case, defecting players seem
to be more resilient to change. 
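
The binomial error estimate mentioned above can be checked against the spontaneous switching rates reported earlier (0.28 from 45 situations and 0.25 from 1400 situations). The sketch below only applies the stated formula; the function name is an illustrative choice.

```python
import math

def binomial_std_error(p, n):
    """Standard deviation of an observed frequency p estimated from n samples."""
    return math.sqrt(p * (1.0 - p) / n)

# Spontaneous switching rates in homogeneous environments (values from the text).
se_cooperators = binomial_std_error(0.28, 45)
se_defectors = binomial_std_error(0.25, 1400)

print(round(se_cooperators, 2), round(se_defectors, 2))
```

The much smaller error for defectors simply reflects the larger number of homogeneous defector neighborhoods observed in the experiment.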

Fig. 3 also shows how the probability to cooperate
depends on the number of cooperating neighbors. This does
not take any payoffs into account and addresses whether
players imitate the common rather than the more
successful. It turns out that the probability to cooperate is below
50% even when all neighbors are cooperating. Thus, in our 
experiment players do not only imitate the most common 
strategy, but decide for cooperation or defection in more 
complex ways. 

The intensity of selection measured in our experiments 
reveals that humans do not simply accept any strategy that 
is performing better than their strategy, as assumed by
imitation dynamics. However, $\beta$ is also so high that analytical
results obtained under weak selection may not always
apply. Again, we can summarize our approach by a simple
equation. If we neglect temporal dependence, but take the
differences between cooperators and defectors into account,
we find
Our analysis leads to $\mu_C = 0.28 \pm 0.07$, $\beta_C = 0.67 \pm 0.28$
and $\alpha_C = -0.11 \pm 0.23$ for cooperating players and $\mu_D =
0.25 \pm 0.01$, $\beta_D = 0.99 \pm 0.23$ and $\alpha_D = 0.79 \pm 0.14$ for
defecting players.
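
To see what the fitted parameters imply, the following sketch evaluates the shifted logistic form (1 + exp[-beta * delta_pi + alpha])^(-1) given in the caption of Fig. 3, separately for cooperators and defectors. It is only an illustration: the units of delta_pi (euros here), the sign convention (neighbor's payoff minus own payoff) and all names are assumptions, and the spontaneous switching rates are not included.

```python
import math

# Fitted parameters from the text; keys denote the focal player's current strategy.
FITTED = {
    "C": {"beta": 0.67, "alpha": -0.11},
    "D": {"beta": 0.99, "alpha": 0.79},
}

def switch_probability(delta_pi, focal_strategy, params=FITTED):
    """Probability that a focal player switches to the other strategy.

    delta_pi is the payoff of the best neighbor with the other strategy
    minus the focal player's payoff (units and sign convention assumed).
    """
    beta = params[focal_strategy]["beta"]
    alpha = params[focal_strategy]["alpha"]
    return 1.0 / (1.0 + math.exp(-beta * delta_pi + alpha))

for strategy in ("C", "D"):
    print(strategy, [round(switch_probability(d, strategy), 2) for d in (-1.0, 0.0, 1.0)])
```

At equal payoffs the positive offset for defectors lowers their switching probability below that of cooperators, which is one way to read the statement that defectors are more resilient to change.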



II. DISCUSSION 

As expected, players imitate others with probability in- 
creasing with the payoff difference. In evolutionary game 
dynamics, this corresponds to selection. But sometimes 
players switch spontaneously to a new strategy at random, 
which corresponds to a mutation. Our approach reveals 
that the probability of such random changes is much higher 
than typically assumed in theoretical models. 

Theoreticians are often interested in the dynamics for very 
large populations and not in finite size effects. However, 
considering large populations is infeasible in behavioral
experiments, where many repeats are required. Moreover, our
predecessors lived in small social groups and our behav- 
ior may have adapted to that situation. Regardless of the 
complexity of our modern society, human interactions oc- 
cur typically within small social groups even today. Most 
importantly, the way players choose strategies based on lo- 
cal information does not seem to be fundamentally different 
in larger systems [41]. Decision making in humans is
certainly a complicated process that goes far beyond the
simple models that are typically considered. However, we argue
that important aspects of human behavior are not captured 
by the different mechanisms of imitation. Modeling these 
processes by random strategy choice can lead to very differ- 
ent dynamics in theoretical models and captures the general 
trend of the dynamics in our system, cf. Fig. 2.

In our experiment, we have analyzed the simplest system 
in which humans play a spatial game. Many challenges lie 
ahead: Theoretical models describe not only interactions on 
regular lattices, but also heterogeneous networks [42],
dynamical networks [43] or set-structured populations [44]. It
would be fruitful to initiate a discussion in the scientific com- 
munity how such more complex models can be approached 
by behavioral experiments. 



III. METHODS 

From 2003-2004, voluntary human subjects for the ex- 
periment were recruited from first semester biology courses 
at the Universities of Kiel, Cologne and Bonn. A total of 
400 students participated in the experiment. The students 
were divided into 25 groups consisting of 16 players each. 

In the spatial treatment (15 groups) the 16 subjects were 
virtually arranged on a spatial grid with periodic boundary 
conditions. This torus-shaped geometry ensures that there
are no edges in the system. Each subject had four fixed
direct neighbors throughout the experiment (von Neumann
neighborhood). To ensure the players' anonymity, each
player was identified by a letter ranging from a to p (e.g.
a has the following neighbors: b, d, e and m). The
subjects would exclusively interact with these four neighbors
and received no further information about the remaining 11
subjects. In the non-spatial control treatment (10 groups)
the 16 subjects were positioned on a new random position
on the lattice in each round, such that the probability that 
another interaction with a particular co-player takes place 
is 4/15. Otherwise the control experiment was conducted 
exactly in the same way as in the spatial treatment. The 
students were fully aware of whether they were in a fixed or 
randomized neighborhood. 
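
The lattice geometry can be illustrated with a short sketch. It assumes that the letters a to p are assigned row by row on the 4x4 grid; this lettering is an assumption, chosen because it places b, d and e next to a as in the example above. All names in the code are illustrative.

```python
import string
from fractions import Fraction

SIZE = 4
LABELS = string.ascii_lowercase[:SIZE * SIZE]  # 'a' .. 'p', row-major (assumed)

def von_neumann_neighbors(label):
    """Return the four neighbors of a player on the 4x4 torus, sorted by letter."""
    row, col = divmod(LABELS.index(label), SIZE)
    coords = [
        (row, (col + 1) % SIZE),  # right
        (row, (col - 1) % SIZE),  # left (wraps around the torus)
        ((row + 1) % SIZE, col),  # below
        ((row - 1) % SIZE, col),  # above (wraps around the torus)
    ]
    return sorted(LABELS[r * SIZE + c] for r, c in coords)

# In the random-neighbor treatment, a given co-player occupies one of the
# 4 neighboring sites out of the 15 sites not taken by the focal player.
reencounter_probability = Fraction(4, SIZE * SIZE - 1)

print("neighbors of a:", von_neumann_neighbors("a"))
print("re-encounter probability:", reencounter_probability)
```

Because of the periodic boundaries every player, including those in the corners of the grid, has exactly four distinct neighbors.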

The subjects started in both treatments without money 
on their account. Each group played a total of 25 Prisoner's
Dilemma rounds, allowing them to earn on average
between 10.00 € (for full defection) and 30.00 € (for full
cooperation). A single player, however, may theoretically
also obtain nothing (if the player always cooperates, but his
four partners always defect) or up to 40.00 € (if the player
always defects, but his partners always cooperate). 
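
These earnings follow directly from the payoff values, the four interactions per round and the 25 rounds. A minimal sketch of the arithmetic, using euro cents to avoid rounding issues (the function name is illustrative):

```python
ROUNDS = 25
NEIGHBORS = 4
R, S, T, P = 30, 0, 40, 10  # payoffs per interaction in euro cents (from the text)

def total_earnings_cents(payoff_per_interaction, rounds=ROUNDS, neighbors=NEIGHBORS):
    """Total earnings if a player receives the same payoff in every interaction."""
    return payoff_per_interaction * neighbors * rounds

earnings = {
    "full defection": total_earnings_cents(P),
    "full cooperation": total_earnings_cents(R),
    "lone cooperator among defectors": total_earnings_cents(S),
    "lone defector among cooperators": total_earnings_cents(T),
}
for scenario, cents in earnings.items():
    print(f"{scenario}: {cents / 100:.2f} EUR")
```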

Each subject had a decision box on his/her private table 
that was equipped with silent YES, NO and OK buttons. 
During a short oral introduction the subjects received infor- 
mation about the use of their decision box and how their 
anonymity would be ensured throughout and after the ex- 
periment. At the beginning of the experiment a written 
instruction explaining the game (see Supporting Informa- 
tion) was projected on a screen visible to all players. Each 
subject had to confirm via the OK button that he/she had 
finished reading and had understood each of the displayed 
instruction pages. 

In both treatments, each subject had to make a single 
decision in each round, either to cooperate or to defect, in the
Prisoner's Dilemma played with all four neighbors
simultaneously. This setting corresponds to synchronous strategy
adjustment. After every round the subjects could observe
the results of the round on their personal display which could 
display a maximum of 32 characters. Decisions were dis- 
played in the following form: 
The display was explained in detail in three examples
and subjects had no problems understanding it. Here, s, t,
u, v and w are the codes for the different players. Each 
player is provided with his own strategy (cooperation, Y, 
or defection, N) and payoff as well as the chosen strategies 
of the direct neighbors and their respective payoffs, which 
resulted from their interactions with their 4 neighbors (e.g. 
own payoff of s: 6 = 0.60 €; payoff player t: 12 = 1.20 
€). The computer calculated the individual's payoff from 
all four encounters and transferred the cumulated payoff to
the player's account after each round. At the end of the ex- 
periment the players received the money on their respective 
account in cash without losing their anonymity, see [45] for 
details. 

Throughout the experiment the complete anonymity of 
the subjects was assured by the following measures: Sub- 
jects were seated between separations, such that no visual 
contact between them was possible. All boxes were con- 
nected to a computer to record each individual decision. 
The subjects were informed that they were not allowed 
to talk or to contact each other during the experiment. 
Each player could only be identified by his pseudonym (a-p),
both by other players and by the experimenters.
Pseudonyms could not be connected with the students' real 
identity. 
FIG. 3: Strategy updating in a spatial game. (a) As
expected, the probability to switch to another strategy
increases with the payoff difference. Theoretical models
typically assume strategy update functions such as $p =
(1 + \exp[-\beta \Delta\pi])^{-1}$, where $p$ is the probability to switch
strategy and $\Delta\pi$ is the payoff difference. Fitting this function
leads to an intensity of selection $\beta = 1.20 \pm 0.25$ (full line).
However, the data for cooperating and defecting players seem
to follow different characteristics and defecting players seem
to be more resilient to change than cooperators. To capture
this, we have also fitted the two different data sets to the
function $(1 + \exp[-\beta \Delta\pi + \alpha])^{-1}$ (dotted lines). This approach
leads to $\beta_C = 0.67 \pm 0.28$ and $\alpha_C = -0.11 \pm 0.23$ for
cooperating players. For defecting players, we find $\beta_D = 0.99 \pm 0.23$
and $\alpha_D = 0.79 \pm 0.14$. The inset shows the probability
to change strategies spontaneously, without any role models
playing a different strategy. Such spontaneous changes
correspond to mutations and occur with probability $0.28 \pm 0.07$
(cooperating players switching to defection) or $0.25 \pm 0.01$
(defecting players switching to cooperation). This probability is
much higher in our experiment than typically assumed for
theoretical models, but decreases exponentially in time (see
Fig. 1). (b) Another perspective is to infer the probability to
cooperate in the next round, given the number of cooperating
neighbors in the current round. This probability is highest if
all neighbors cooperate, although in this case the payoff from
defection would be highest. This indicates that humans do
not only imitate what is successful, but also what is common.
(All error bars are the standard deviations of a binomial
distribution, $\sqrt{p(1-p)/n}$, where $n$ is the number of samples.)



